13 research outputs found

    A study on map-matching and map inference problems


    Trajectories know where map is wrong: an iterative framework for map-trajectory co-optimisation

    Low map quality has been a persistent problem, usually caused by belated map updates. Although recent research on map inference/update enables timely map updates through the use of trajectory data, the update quality is still far from practically useful due to trajectory inaccuracy. In this work, we propose an iterative map-trajectory co-optimisation framework which refines traditional map inference/update results by considering their contribution to the quality of both the map and the trajectory map-matching results. In each iteration, we propose two respective scores to measure the credibility and influence of each road update, and refine the map and the map-matching result accordingly. Meanwhile, we quantify the quality of the map and of the trajectory-matching results, so that the goal of our iterative co-optimisation is to maximise the overall quality. Additionally, to accelerate the iterative process, we introduce an R-tree-based spatial index to avoid unnecessary map-matching. Overall, our framework supports most of the existing map inference/update methods and significantly improves the quality of their update results with affordable overhead. We conduct extensive experiments on real-world datasets of different scales. The results show significant quality improvement over the state-of-the-art map update methods while efficiency stays competitive.
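The iterative accept-or-reject loop described in this abstract can be sketched as follows. This is an illustrative outline, not the paper's actual method: `quality_of`, `credibility`, and `influence` are placeholders for the paper's quality measures and the two per-update scores.

```python
# Hypothetical sketch of the iterative map-trajectory co-optimisation loop.
# All function names are illustrative stand-ins, not the paper's API.

def co_optimise(road_updates, quality_of, credibility, influence, max_iters=10):
    """Greedily accept road updates while they improve overall quality."""
    accepted = []
    best = quality_of(accepted)
    for _ in range(max_iters):
        # Rank pending updates by a combined credibility/influence score.
        pending = sorted(
            (u for u in road_updates if u not in accepted),
            key=lambda u: credibility(u) * influence(u),
            reverse=True,
        )
        improved = False
        for u in pending:
            q = quality_of(accepted + [u])
            if q > best:          # keep the update only if overall quality rises
                accepted.append(u)
                best = q
                improved = True
        if not improved:          # fixed point reached: no remaining update helps
            break
    return accepted, best
```

In the paper's setting the expensive part of each iteration is re-running map-matching after a map change, which is why an R-tree over the affected road segments can prune trajectories whose map-matching result cannot have changed.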

    An iterative map-trajectory co-optimisation framework based on map-matching and map update

    Digital maps have long suffered from low data quality caused by lengthy update periods. Recent research on map inference/update shows the possibility of updating the map using vehicle trajectories. However, since trajectories are intrinsically inaccurate and sparse, the existing map correction methods are still inaccurate and incomplete. In this work, we propose an iterative map-trajectory co-optimisation framework that takes raw trajectories and the map as input and improves the quality of both datasets simultaneously. The map and map-matching qualities are quantified by our proposed measures. We also propose two scores to measure the credibility and influence of new road updates. Overall, our framework supports most of the existing map inference/update methods and can directly improve the quality of their updated maps. We conduct extensive experiments on real-world datasets to demonstrate the effectiveness of our solution over other candidates.

    Integrating workload balancing and fault tolerance in distributed stream processing system

    A Distributed Stream Processing Engine (DSPE) is designed for processing continuous streams in real time with low latency guaranteed. To satisfy this requirement, availability and efficiency are the main concerns of a DSPE system, and they can be achieved by a proper design of the fault tolerance module and the workload balancing module, respectively. However, the inherent characteristics of data streams, including persistence, dynamicity and unpredictability, pose great challenges to satisfying both properties. As far as we know, most state-of-the-art DSPE systems take either fault tolerance or workload balancing as their single optimisation goal, which in turn incurs higher resource overhead or longer recovery time. In this paper, we combine the fault tolerance and workload balancing mechanisms in the DSPE to reduce overall resource consumption while keeping the system interactive, high-throughput, scalable and highly available. Based on our data-level replication strategy, our method can handle both dynamic data skewness and node failure scenarios: when the distribution of the incoming stream fluctuates, we rebalance the workload by selectively inactivating data on high-load nodes and activating their replicas on low-load nodes, minimising migration overhead within the stateful operator; when a fault occurs, the system activates the replicas of the affected data to ensure correctness while keeping the workload balanced. Extensive experiments on various join workloads over both benchmark data and real data show the superior performance of our approach compared with baseline systems.
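The replica-activation idea can be sketched in a few lines. This is a simplified illustration under assumed data structures (each item holds copies on two nodes and a flag for the active copy), not the system's actual implementation; the `threshold` factor over the average load is likewise an assumption.

```python
# Illustrative sketch: rebalance load by flipping which copy of an item is
# active, instead of migrating state between nodes.

def rebalance(items, load, threshold):
    """items: list of dicts {'weight': w, 'copies': [nodeA, nodeB], 'active': 0 or 1}.
    load: dict node -> current load. Mutates items and load in place."""
    avg = sum(load.values()) / len(load)
    cap = avg * threshold                      # load ceiling per node
    for it in items:
        src = it['copies'][it['active']]
        dst = it['copies'][1 - it['active']]
        # Move the item only if its node is overloaded and the replica's node
        # can absorb it without itself exceeding the ceiling.
        if load[src] > cap and load[dst] + it['weight'] <= cap:
            it['active'] = 1 - it['active']    # activate the replica: no data movement
            load[src] -= it['weight']
            load[dst] += it['weight']
    return load
```

Because the replica already holds the data, the same flip serves failure recovery: when `src` crashes, activating the surviving copies restores correctness with the load spread over the remaining nodes.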

    A survey and quantitative study on map inference algorithms from GPS trajectories

    Map inference algorithms aim to construct a digital map from other data sources automatically. Given the labour intensity of traditional map creation and the frequency of road changes nowadays, map inference is deemed a promising solution for automatic map construction and update. However, existing map inference from GPS trajectories suffers from low GPS data quality, which makes the quality of the constructed maps unsatisfactory. In this paper, we study the existing map inference algorithms that use GPS trajectories. Different from previous surveys, we (1) include the most recent solutions and propose a new categorisation of methods; (2) study how different types of GPS errors affect the quality of inference results; (3) evaluate the existing map inference quality measures regarding their ability to identify map quality issues. To achieve these goals, we conduct a comprehensive experimental study on several representative algorithms using both real-world datasets and synthetic datasets, the latter generated by our proposed synthetic trajectory generator and artificial map generator. Overall, our study provides insightful observations regarding (1) which inference method performs better in each working scenario, (2) the general data quality requirements for map inference, and (3) directions for future work on quantitative map quality measures.
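To make the task concrete, the simplest family of map inference methods the survey covers can be sketched as density thresholding on a raster grid: rasterise the GPS points and keep cells with enough observations as candidate road cells. The cell size and threshold below are illustrative parameters, not values from the survey.

```python
# Minimal density-based map inference sketch: dense grid cells ~ road cells.
from collections import Counter

def infer_road_cells(points, cell_size=10.0, min_points=3):
    """points: iterable of (x, y) coordinates in metres.
    Returns the set of (col, row) grid cells with at least min_points hits."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell for cell, n in counts.items() if n >= min_points}
```

This toy already exhibits the survey's central trade-off: a low threshold lets GPS noise create spurious roads, while a high threshold erases sparsely travelled real roads.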

    A performance study on large-scale data analytics using disk-based and in-memory database systems

    With the significant increase in memory size, in-memory database systems are becoming the dominant way of handling large-scale data analytics, as compared to traditional disk-based systems such as data warehouses. Due to significant differences in both physical and logical design, these two kinds of systems show very different characteristics on massive data analytics workloads. To examine these differences and the technical reasons behind them, we contrast the performance of disk-based data warehousing and in-memory database systems by comparing two state-of-the-art commercial systems on a large-scale real transportation dataset. This independent performance study reveals several interesting insights. The experimental evaluation shows that the in-memory system can achieve competitive performance on most data analytics queries with lower model maintenance cost and more flexibility, but falls short in other cases. We summarise the results of our study and provide guidelines on how to select an appropriate system for a given data analytics task.

    Graph-based analysis of city-wide traffic dynamics using time-evolving graphs of trajectory data

    This paper proposes a graph-based approach to representing spatio-temporal trajectory data that allows effective visualization and characterization of city-wide traffic dynamics. With the advance of sensor, mobile, and Internet of Things (IoT) technologies, vehicle and passenger trajectories are increasingly being collected at massive scale and are becoming a critical source of insight into traffic patterns and traveller behaviour. To leverage such trajectory data to better understand traffic dynamics in a large-scale urban network, this study develops a trajectory-based network traffic analysis method that converts individual trajectory data into a sequence of graphs that evolve over time (known as dynamic graphs or time-evolving graphs) and analyses network-wide traffic patterns in terms of a compact and informative graph representation of aggregated traffic flows. First, we partition the entire network into a set of cells based on the spatial distribution of data points in individual trajectories, where the cells represent spatial regions between which aggregated traffic flows can be measured. Next, dynamic flows of moving objects are represented as a time-evolving graph, where regions are graph vertices and flows between them are treated as weighted directed edges. Given a fixed set of vertices, edges can be inserted or removed at every time step depending on the presence of traffic flows between two regions in a given time window. Once a dynamic graph is built, we apply graph mining algorithms to detect change-points in time, i.e., time points where the graph exhibits significant changes in its overall structure and which thus correspond to change-points in the city-wide mobility pattern throughout the day (e.g., global transition points between peak and off-peak periods).
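The trajectory-to-graph conversion step can be sketched compactly. In this hedged illustration, a trajectory is assumed to already be map-partitioned into a sequence of (timestamp, cell) pairs; each time window then yields a weighted directed graph counting inter-cell transitions.

```python
# Sketch: build one weighted directed graph per time window from
# cell-sequence trajectories.
from collections import defaultdict

def build_time_evolving_graph(trajectories, window):
    """trajectories: lists of (timestamp, cell) pairs, time-ordered.
    Returns dict: window_index -> {(cell_from, cell_to): flow_count}."""
    graphs = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for (t0, c0), (t1, c1) in zip(traj, traj[1:]):
            if c0 != c1:                      # only inter-cell movements count as flow
                graphs[int(t1 // window)][(c0, c1)] += 1
    return {w: dict(g) for w, g in graphs.items()}
```

Change-point detection then operates on this sequence of edge-weight dictionaries, flagging windows whose graph structure differs sharply from the previous window's.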

    A restaurant recommendation system by analyzing ratings and aspects in reviews

    Recommender systems are widely deployed to predict users' preferences for items. They are popular in helping users find movies, books and products in general. In this work, we design a restaurant recommender system based on a novel model that captures correlations between hidden aspects in reviews and numeric ratings. It is motivated by the observation that a user's preference for an item is affected by the different aspects discussed in reviews. Our method first applies topic modeling to discover hidden aspects from review text. Profiles are then created for users and restaurants separately, based on the aspects discovered in their reviews. Finally, we utilize regression models to capture the user-restaurant relationship. Experiments demonstrate the advantages of our approach.
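The final regression step can be sketched as follows. This is an assumed simplification, not the paper's model: user and restaurant profiles are taken to be equal-length aspect-weight vectors (e.g. topic proportions from their reviews), and a rating is modelled as a learned per-aspect weighting of their elementwise interaction.

```python
# Hedged sketch: fit per-aspect weights w so that
#   rating ~ sum_k w[k] * user_profile[k] * item_profile[k]
# via plain SGD on squared error.

def fit_aspect_regression(samples, lr=0.1, epochs=500):
    """samples: list of (user_profile, item_profile, rating) triples,
    profiles being equal-length lists of aspect weights."""
    k = len(samples[0][0])
    w = [0.0] * k
    for _ in range(epochs):
        for u, v, r in samples:
            pred = sum(w[i] * u[i] * v[i] for i in range(k))
            err = pred - r
            for i in range(k):
                w[i] -= lr * err * u[i] * v[i]   # gradient step on squared error
    return w
```

The learned weights indicate how strongly agreement on each aspect (e.g. service, price) drives the numeric rating.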

    Random-based algorithm for efficient entity matching

    Most state-of-the-art MapReduce-based entity matching methods inherit traditional Entity Resolution techniques from centralized systems and focus on data blocking strategies for structured entities in order to solve the load balancing problem that arises in distributed environments. In this paper, we propose a MapReduce-based framework for entity matching on semi-structured and unstructured data. Each entity is represented by a high-dimensional vector generated from its description data. In order to reduce network transmission, we produce lower-dimensional bit-vectors, called signatures, for those entity vectors using a Locality Sensitive Hashing (LSH) function; our LSH scheme is designed to preserve cosine similarity. A series of randomised algorithms is designed to ensure the performance of entity matching. Moreover, our design includes a solution for reducing redundant computation using one additional round of MapReduce. Experiments show that our approach has significant advantages in both processing speed and accuracy compared with other methods.
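The standard LSH family for cosine similarity is random-hyperplane hashing (SimHash): each signature bit records which side of a random hyperplane the entity vector falls on, and the probability that two vectors disagree on a bit is proportional to the angle between them. Whether the paper uses exactly this construction is an assumption, but it matches the description; the dimensions and seed below are illustrative.

```python
# Random-hyperplane (SimHash) signatures: compact bit-vectors whose Hamming
# distance estimates the angle (hence cosine similarity) between vectors.
import random

def make_hyperplanes(dim, bits, seed=0):
    """One random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))
```

Shipping only `bits` bits per entity over the network (instead of the full high-dimensional vector) is what cuts the transmission cost, while Hamming distance on signatures stands in for cosine similarity during matching.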

    SEMI: a Scalable Entity Matching system based on MapReduce

    The MapReduce framework provides a new platform for data integration in distributed environments. We demonstrate a MapReduce-based entity resolution framework which efficiently solves the matching problem for structured, semi-structured and unstructured entities. We propose a random-based data representation method for reducing network transmission; we implement our design on MapReduce and devise two solutions for reducing redundant comparisons. Our demo provides an easy-to-use platform for entity matching and performance analysis. We also compare the performance of our algorithm with state-of-the-art blocking-based methods.